IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

In re Application of
RENAT VAFIN ET AL.

Atty. Docket No. NL 010529

Serial No.: 10/003, 05:

Filed: NOVEMBER 2, 2001

Title: CODING SIGNALS

Group Art Unit: 2631

Commissioner for Patents
Washington, D.C. 20231

RECEIVED
FEB 05 2002
Technology Center 2600



CLAIM FOR PRIORITY 



Sir: 



Certified copies of European Application Nos. 01202826.2, filed on July 25, 2001;
01201627.5, filed on May 3, 2001; and 01201570.7, filed on April 7, 2001, referred
to in the Declaration of the above-identified application, are attached herewith.

Applicants claim the benefit of the filing date of said 
EUROPEAN applications. 

Respectfully submitted, 



January 8, 2002 
Enclosure 





By
Jack D. Slobod, Reg. 26,236
Attorney
(914) 333-9606



CERTIFICATE OF MAILING 
It is hereby certified that this correspondence is being deposited with the 
United States Postal Service as first-class mail in an envelope addressed to: 
COMMISSIONER FOR PATENTS 




Europäisches Patentamt

European Patent Office

Office européen des brevets




Bescheinigung / Certificate / Attestation



Die angehefteten Unterlagen stimmen mit der ursprünglich eingereichten Fassung der
auf dem nächsten Blatt bezeichneten europäischen Patentanmeldung überein.

The attached documents are exact copies of the European patent application
described on the following page, as originally filed.

Les documents fixés à cette attestation sont conformes à la version initialement
déposée de la demande de brevet européen spécifiée à la page suivante.



Patentanmeldung Nr. Patent application No. Demande de brevet n° 

01202826.2 



Der Präsident des Europäischen Patentamts;
Im Auftrag

For the President of the European Patent Office

Le Président de l'Office européen des brevets
p.o. 




I.L.C. HATTEN-HECKMAN



DEN HAAG,DEN 
THE HAGUE, 
LA HAYE,LE 



30/10/01 



EPA/EPO/OEB Form 1014 -0251 






Blatt 2 der Bescheinigung 
Sheet 2 of the certificate 
Page 2 de l'attestation



Anmeldung Nr.: / Application no.: / Demande n°: 01202826.2

Anmeldetag: / Date of filing: / Date de dépôt: 25/07/01

Anmelder: / Applicant(s): / Demandeur(s):

Koninklijke Philips Electronics N.V.

5621 BA Eindhoven

NETHERLANDS



Bezeichnung der Erfindung: 
Title of the invention: 
Titre de l'invention:

Coding signals 



In Anspruch genommene Priorität(en) / Priority(ies) claimed / Priorité(s) revendiquée(s)

Staat: Tag: Aktenzeichen:

State: Date: File no.

Pays: Date: Numéro de dépôt:



Internationale Patentklassifikation: 
International Patent classification: 
Classification Internationale des brevets: 

/ 



Am Anmeldetag benannte Vertragsstaaten:

Contracting states designated at date of filing: AT/BE/CH/CY/DE/DK/ES/FI/FR/GB/GR/IE/IT/LI/LU/MC/NL/PT/SE/TR
États contractants désignés lors du dépôt:

Bemerkungen: 

Remarks: 

Remarques: 



EPA/EPO/OEB Form 1012 -11.00 



23.07.2001 



Coding signals 



This invention relates to a method of coding signals and to apparatus for storing,
transmitting, receiving or reproducing signals.



A common method of storing audio signals is to use parametric coding to
represent audio signals, especially at very low bit rates, typically in the region from 6 kbps to
90 kbps. Examples of parametric coding used in this way are given in "Low bit
rate high quality audio coding with combined harmonic and wavelet representation" in
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal
Processing, Volume 2, pp 1045 to 1048, 1996; "Advances in Parametric Audio Coding" in
Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and
Acoustics, pp W99-1 to W99-4, 1999; and "A 6 kbps to 85 kbps scalable audio coder" in
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal
Processing, Volume II, pp 877-880, 2000. In these examples, a parametric audio coder is
described, in which an audio signal is represented by a model, with parameters of the model
being estimated and encoded. These examples use a parametric representation of an audio
signal based on decomposition of an original signal into three components: a transient
component, a tonal (sinusoidal) component, and a noise component. Each component is
represented by a corresponding set of parameters, as described in the three documents above.
A transient component of an audio signal can be characterized as an isolated element of the
audio signal which is relatively short lived, and is represented by a sharp increase in energy
of the audio signal.

It has been found that having a dedicated model for the transient component of
an audio signal proves to be beneficial for parts of audio signals with sharp attacks, because
sinusoidal and noise models cannot easily represent such perceptually important events and
poor modeling can result in audible artifacts such as a pre-echo. A pre-echo occurs when the
modeling error distributes the transient event to the samples before the transient beginning
and when the resulting distortion is large enough to become audible. The distribution of the
modeling error to the samples before the transient beginning results from the
segment-by-segment analysis of an input signal in an audio coder. If a transient occurs in the
middle of an analysis segment, then either a lot of coding resources are required in order to
accurately model the transient, or the modeling error distributes to the whole analysis segment.
Modeling error at the samples preceding a transient is typically perceptually more apparent
than at samples after the transient, because of a weaker masking from the transient event
itself.

In "Residual modeling in music analysis-synthesis" from Proceedings of the
IEEE International Conference on Acoustics, Speech and Signal Processing, Volume 2, pp
1005-1008, 1996, it is shown that transient components cannot satisfactorily be represented
by sinusoidal and noise models alone.

It has been shown previously in "Robust exponential modeling of audio
signals" from Proceedings of the IEEE International Conference on Acoustics, Speech and
Signal Processing, Volume 6, pp 3581-3584, 1998, that transients can be modeled efficiently
using sinusoids with exponentially modulated amplitudes (referred to below as damped
sinusoids). In the text below damping coefficients can be any real number, and positive
values correspond to increasing amplitudes rather than to truly decreasing amplitudes. In
"Robust exponential modeling of audio signals" (see above) an audio signal was analyzed on
a segment-by-segment basis and each segment was represented as a sum of damped
sinusoids. A problem arises with this type of coding when a transient starts in the middle of a
given segment. Compared to the case where a transient starts at the beginning of a segment,
the number of damped sinusoids needed to model the transient well increases considerably.
If a transient is not modeled properly, the modeling error is distributed over the whole of a
given segment, resulting in audible pre-echoes.

In the MPEG-1 Layer III audio coding algorithm, as described in "ISO-MPEG-1
Audio: a generic standard for coding of high-quality digital audio" in the Journal of
the Audio Engineering Society, Volume 42, pp 780-792, October 1994, the segmentation is
defined simply by the lengths of the long and short windows.



It is an object of the present invention to address the above mentioned
disadvantages. To this end the invention provides a method of coding and an apparatus for
coding as defined in the independent claims. Advantageous embodiments are defined in the
dependent claims.

According to a first aspect of the present invention the coding of an input
signal comprises:

- estimating a location of at least one transient in a time segment of the input signal;

- modifying the location of the transient so that the or each transient occurs at a specified
location on a predetermined time scale to obtain a modified signal; and

- modeling the modified signal.

The use of restricted time segmentation in the form of a specified location on a
predetermined time scale to provide the only locations for the transients advantageously
reduces the number of bits needed to describe the segmentation. Also the modification
procedure has lower computational cost compared to a full precision segmentation procedure.

Each transient is preferably re-located to a nearest specified location of a
plurality of possible locations on the predetermined time scale.

The specified locations on the predetermined time scale may be defined by
integer multiples of a predetermined minimum time segment size. The predetermined
minimum time segment size may have a length in the range of approximately 1 millisecond
(ms) to approximately 9 ms, most preferably in the range of approximately 4 ms to
approximately 6 ms.
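
The relocation to the nearest specified location can be sketched as follows. This is a minimal illustration, not the applicants' implementation: the function names are hypothetical, the grid step is truncated to a whole number of samples, and ties between two equally near grid points are resolved towards the later one, which is an arbitrary choice.

```python
def grid_step_samples(sample_rate_hz, min_segment_ms):
    """Minimum segment size in whole samples (truncated; an assumption)."""
    return int(sample_rate_hz * min_segment_ms / 1000)


def snap_to_grid(n0, step):
    """Nearest integer multiple of `step` to the sample index `n0`.

    Ties go to the later grid point (an arbitrary illustrative choice).
    """
    return (n0 + step // 2) // step * step


step = grid_step_samples(44100, 5)   # 5 ms at 44.1 kHz
n_hat = snap_to_grid(1000, step)     # grid location for a transient at sample 1000
shift = 1000 - n_hat                 # time shift applied to the transient
```

With this rule a transient is never moved by more than half a grid step.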

The use of a restricted time segmentation as described advantageously
simplifies the modeling procedure significantly, if rate-distortion control is used to distribute
coding resources between transient, sinusoidal and noise components of the input signal
being modeled.

The modeling preferably uses damped sinusoids. 

The audio signal is preferably sampled at a rate of approximately 5 to 50 kHz,
most preferably 8, 16, 32, 44.1 or 48 kHz. The video signal is preferably sampled at a rate
of approximately 5 to 20 MHz.

The restricted time segmentation may also be applied to tonal and/or noise
components of an input signal.

The estimation of the location of transients may be carried out using an
energy-based approach, preferably with a moving window method, most preferably using
two sliding windows.
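
A minimal sketch of an energy-based detector with two sliding rectangular windows is given below. The measure used here, a plain ratio of right-window energy to left-window energy, is an assumption chosen for illustration and is not the criterion function of the cited reference; the function names and the small constant `eps` are likewise hypothetical.

```python
def window_energy(x, start, stop):
    """Energy of the samples x[start:stop]."""
    return sum(v * v for v in x[start:stop])


def energy_ratio(x, win, eps=1e-12):
    """Ratio of the energy just after sample n to the energy just before it.

    Only positions where both length-`win` windows fit are evaluated;
    elsewhere the value is left at zero.
    """
    c = [0.0] * len(x)
    for n in range(win, len(x) - win + 1):
        e_left = window_energy(x, n - win, n)
        e_right = window_energy(x, n, n + win)
        c[n] = e_right / (e_left + eps)
    return c


def transient_onset(x, win):
    """Index of the largest peak of the ratio (single-transient case)."""
    c = energy_ratio(x, win)
    return max(range(len(x)), key=lambda n: c[n])
```

For a quiet background with a burst starting at sample 100, the ratio peaks at the first sample of the burst, where the left window still holds only background.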

The use of an energy-based approach allows the advantageous estimation of
both very short transients and longer transients.

The location of transients may involve the location of a beginning and an end
of each transient.

Preferably each located transient is moved by a cut and paste method from its
original location to begin at a location on the predetermined time scale.

The cut and paste method simply removes that part of the input signal
identified as a transient and moves it to the new location. Thus the step is very simple to
implement.

A remaining section of the input signal between two located and modified
transients is preferably time-warped to fill the gap remaining following the relocation. The
time-warp may be a lengthening or a shortening of said remaining section.

By using knowledge of sound perception, including pitch perception and
temporal masking effects, the time-warping is a simple method with which to restore the
remaining signal after modification of the transients.

The time-warping preferably preserves the amplitudes of edge-points of the
modified signal, preferably by a band limited interpolation method.
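
The cut and paste step followed by time-warping can be sketched as below. This is a simplified illustration under stated assumptions: a single transient is moved, the sections before and after it are both non-empty, and plain linear interpolation stands in for the band limited interpolation preferred in the text. The function names are hypothetical.

```python
def warp_linear(section, new_len):
    """Resample a non-empty `section` to `new_len` samples.

    Linear interpolation is used as a stand-in for band limited
    interpolation; the first and last samples keep their amplitudes.
    """
    if new_len <= 0:
        return []
    if new_len == 1 or len(section) == 1:
        return [section[0]] * new_len
    scale = (len(section) - 1) / (new_len - 1)
    out = []
    for k in range(new_len):
        pos = k * scale
        i = min(int(pos), len(section) - 2)
        frac = pos - i
        out.append((1 - frac) * section[i] + frac * section[i + 1])
    return out


def relocate_transient(signal, start, end, new_start):
    """Move the transient signal[start:end] so that it begins at new_start.

    The sections before and after the transient are warped so that the
    output has the same length as the input.
    """
    before, transient, after = signal[:start], signal[start:end], signal[end:]
    new_end = new_start + len(transient)
    return (warp_linear(before, new_start)
            + transient
            + warp_linear(after, len(signal) - new_end))
```

The transient samples themselves are copied unchanged; only the surrounding sections are lengthened or shortened.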

The time-warp is preferably carried out by interpolation where the change in
the fundamental frequency, f0, of the remaining section is less than approximately 0.3%, most
preferably less than approximately 0.2%.
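
As a worked check of this threshold, the sketch below assumes that the warp is a uniform resampling, so that a section of `old_len` samples stretched to `new_len` samples has its fundamental frequency scaled by `old_len / new_len`. The function names are hypothetical; the 0.3% limit is the figure quoted above.

```python
def f0_change(old_len, new_len):
    """Relative change of fundamental frequency under uniform resampling."""
    return abs(old_len / new_len - 1.0)


def can_warp_whole_section(old_len, new_len, limit=0.003):
    """True if the whole section may be interpolated (change below `limit`)."""
    return f0_change(old_len, new_len) < limit


# A 1 s section at 44.1 kHz lengthened by 110 samples (half a 220-sample grid step):
long_section_ok = can_warp_whole_section(44100, 44210)   # about 0.25 % change
# A 0.1 s section lengthened by the same amount:
short_section_ok = can_warp_whole_section(4410, 4520)    # about 2.4 % change
```

The same shift is therefore acceptable for a long section but not for a short one, which is the case the split into a first and a second length addresses.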

Otherwise, the remaining section is preferably split into a first length
immediately after the modified transient and a second length. Preferably, the first length is
approximately 8 ms to 12 ms, most preferably approximately 10 ms. The first length is
preferably interpolated if the change of fundamental frequency caused is no more than
approximately 1.6% to 2.4%, most preferably no more than approximately 2%. For the
second length, the change of fundamental frequency is preferably not more than about 0.16%
to 0.24%, most preferably approximately 0.2%.

Where the interpolation is insufficient to fill a gap in the remaining section an 
overlap-add procedure is preferably used. 

The modification of the location of the or each transient may be performed
using a transformation into a frequency domain, preferably with a discrete cosine transform.
The resulting sinusoidal representation may then be analyzed for transient locations using a
Hanning window. Preferably, the Hanning window has a length of approximately 512
samples (where a sample has a length of one divided by a sampling frequency of the input
signal), preferably with an overlap between Hanning windows of 256 samples.

The input signal is preferably processed by dividing the input signal into a
plurality of time segments. The time segments may have a length in the range of
approximately 0.5 s to 2 s, preferably a length of approximately 1 s.

Adjacent time segments are preferably arranged to overlap, preferably by
approximately 5% to approximately 15% of their length, more preferably the overlap is
approximately 10% of the time segment length, which overlap may be approximately 0.1 s.
Where a transient is located in an overlap of the adjacent time segments, the transient
location is modified in the time segment in which the transient is most centrally located.

The provision of an overlap in adjacent time segments advantageously allows
the selection of the time segment in which the transient is most centrally located, or more
importantly furthest from the beginning or end of the time segment.
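
The selection between overlapping segments can be sketched as follows, taking "most centrally located" to mean the largest distance to the nearer segment border. The function names and the representation of a segment as a `(start, length)` pair are assumptions made for illustration.

```python
def border_distance(n, seg_start, seg_len):
    """Distance in samples from sample n to the nearer border of a segment."""
    return min(n - seg_start, seg_start + seg_len - 1 - n)


def choose_segment(n, segments):
    """Index of the segment containing n in which n is furthest from a border.

    `segments` is a list of (start, length) pairs; at least one of them
    is assumed to contain n.
    """
    candidates = [(border_distance(n, start, length), index)
                  for index, (start, length) in enumerate(segments)
                  if start <= n < start + length]
    return max(candidates)[1]


# Two segments of 44288 samples overlapping by 4410 samples.
segments = [(0, 44288), (44288 - 4410, 44288)]
```

A transient at sample 40000 lies in the overlap but close to the start of the second segment, so it is handled in the first; a transient at sample 44000 is handled in the second.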

The invention extends to decoding audio or video signals coded according to
the coding of the first aspect.

An apparatus according to an embodiment of the invention may be an audio
device, e.g. a solid state audio device.

All of the features disclosed herein can be combined with any of the above
aspects, in any combination.

Preferred embodiments of the invention provide coding of signals which has a
simpler analysis procedure than has previously been described, coding of signals which
has a lower computational cost than equivalent methods, and coding of signals which
results in a reduction of the number of bits needed to describe a segmented signal.

Additional side information may be included in the bitstream to dewarp the
signal at the decoder side. With the appropriate dewarping, temporal misalignment of stereo
signals can be avoided.



Specific embodiments of the present invention will now be described, by way
of example, and with reference to the accompanying drawings, in which:

Figure 1 shows the performance of a damped sinusoidal model in the case of a
restricted segmentation of an audio signal for an original and a time shifted transient for a
first embodiment;

Figure 2 shows an original transient and its reconstruction with 25 damped
sinusoids;

Figure 3 shows a time shifted transient and its reconstruction with 25 damped
sinusoids for the first embodiment;

Figure 4 is a flow diagram of the steps involved in the method of coding audio
signals in the first embodiment;

Figure 5 is a diagrammatic illustration of the modification of transient location
in a second embodiment;

Figure 6 is a diagrammatic illustration similar to that of Figure 5;

Figure 7 shows an original transient and its reconstruction;

Figure 8 shows a shifted transient and its reconstruction according to the
second embodiment;

Figure 9 is a flow diagram of the steps involved in the second embodiment;
and

Figure 10 is a schematic diagram of an audio encoder and an audio decoder
utilizing the methods described herein.



The first method disclosed herein, and as shown in Figure 4, uses a restricted
time segmentation, in which segments of an audio signal are defined by integer multiples of a
predefined minimum segment size, which in the example used is 5 ms, but of course this
could vary. In view of the restricted time segmentation the transient component of the audio
signal is modified such that transients can start only at the beginning of a segment. The
modified signal is then modeled, in this example by using damped sinusoids. This results in
an efficient representation of transients with damped sinusoids.

The coding of audio involves a first step of modifying the location of transient
elements of the signal so that the transients can occur only at locations defined by a relatively
coarse time grid, as described below in the discussion of experimental results. In order to
modify the locations of transients in the audio signal the following steps are taken:
1. The transient component of an original audio signal is estimated and is subtracted from
the original audio signal to form a residual signal.
2. The locations of the estimated transients are then modified in such a way that the
transients can only occur at locations specified on a grid.
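
The two steps above, followed by adding the modified transient back to the residual, can be sketched in the time domain as below. This is a simplification made for illustration: in the first embodiment the shift is carried out on DCT-domain sinusoids rather than by moving samples, and the transient estimate is taken here as given. The function name and arguments are hypothetical.

```python
def modify_transient(signal, transient, n0, step):
    """Subtract, shift and re-add an estimated transient.

    `signal` and `transient` are equal-length lists; `transient` is zero
    outside the event and starts at sample `n0`; `step` is the grid
    spacing in samples. Returns (residual, moved transient, modified signal).
    """
    # Step 1: residual = original signal minus estimated transient.
    residual = [s - t for s, t in zip(signal, transient)]
    # Step 2: move the transient so that it starts on the nearest grid point.
    n_hat = (n0 + step // 2) // step * step
    shift = n0 - n_hat
    moved = [0.0] * len(signal)
    for n, t in enumerate(transient):
        if t != 0.0 and 0 <= n - shift < len(signal):
            moved[n - shift] = t
    # Modified signal: residual plus relocated transient.
    modified = [r + m for r, m in zip(residual, moved)]
    return residual, moved, modified
```

The transient keeps its shape and energy; only its position relative to the residual changes.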

During the transient estimation and modification, it has been verified that
when the modified transient signal is added to the residual signal obtained in step 1 above,
there is no perceptual difference between the obtained signal and the original audio signal.

In order to modify the transient locations it is necessary to estimate the
transient component of the original audio signal to be coded. It is possible to use different
transient models in parametric coding of audio. One example which has been used is the
transient model based on duality between the time and frequency domain presented in
"Transient modeling synthesis: a flexible analysis/synthesis tool for transient signals", in
Proceedings of the International Computer Music Conference, pp 25-30, 1997.

In more detail, the transient estimation model presented in the above reference
is based on the duality between the time and the frequency domain. A delta impulse in the
time domain corresponds to a sinusoid in the frequency domain. Furthermore, a sharp
transient in the time domain corresponds to a frequency domain signal which can be
represented efficiently by a sum of sinusoids. More specifically, the transients are estimated
using the following steps.

1. A discrete cosine transform (DCT) is used to transform a time domain segment to the
frequency domain. The segment size (equivalently, the DCT size) should be sufficiently
large to ensure that a transient is a short event in time (thus, transformed to the frequency
domain, it can be modeled efficiently by sinusoids). A block size of about 1 s has been
found to be sufficient.

2. The frequency domain (DCT domain) signal is analysed with a sinusoidal model. One
example which has been used is a consistent iterative sinusoidal analysis/synthesis with
Hanning-windowed sinusoids, as described in "High quality consistent analysis-synthesis
in sinusoidal coding", from Proceedings of the Audio Engineering Society 17th
Conference "High quality audio coding", pp 244-250, 1999.
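
The duality used in step 1 can be checked numerically. The sketch below assumes an unnormalised DCT of type II, which the text does not specify; under that assumption a unit impulse at sample `n0` transforms to a cosine over the DCT index `k` with frequency pi*(2*n0 + 1)/(2*N). The function name `dct2` and the small sizes are chosen for illustration only.

```python
import math


def dct2(x):
    """Unnormalised DCT-II, computed directly (O(N^2), fine for small N)."""
    n_points = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
                for n in range(n_points))
            for k in range(n_points)]


N, n0 = 64, 20
delta = [0.0] * N
delta[n0] = 1.0                                    # impulse at sample n0

spectrum = dct2(delta)                             # DCT-domain signal
omega = math.pi * (2 * n0 + 1) / (2 * N)           # predicted cosine frequency
expected = [math.cos(omega * k) for k in range(N)]
```

A later impulse gives a higher `omega`, in line with the statement that the transient location is carried by the frequencies of the DCT-domain sinusoids.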

The sinusoidal analysis of a DCT domain segment is done on a segment by
segment basis. As a result, the DCT-domain segment is represented as

[equation (1) is illegible in the source; its index ranges appear to be
$l = 0, \ldots, L-1$ and $i = 1, \ldots, I$] (1)

where $L$ is the length of the sinusoidal segments (the shift between sinusoidal segments is
$L/2$). The length of the sinusoidal segments, $L$, is a small fraction of the DCT size, $N$. $h(l)$
are samples of the Hanning window, and $A_{ij}$, $\omega_{ij}$, $\phi_{ij}$ are amplitudes, frequencies and
phases of the estimated sinusoids respectively. The index $i$ denotes a particular sinusoidal
segment within the DCT-domain segment, while the index $j$ denotes a particular sinusoid
within the sinusoidal segment. The information about the location of a transient in a time
domain segment is contained in the frequency parameters of the corresponding sinusoids. A
transient in the beginning of a segment results in low sinusoidal frequencies, while a transient
in the end of the segment results in high sinusoidal frequencies. The frequency resolution of
the sinusoidal model depends on the required resolution in estimation of transient locations.
If the required time resolution is one sample then the required frequency resolution is defined
by the reciprocal of the DCT size.

Due to the duality between the transient location in a time domain segment
and the frequencies of the corresponding sinusoids, the obvious way to modify the transient
location is to modify the corresponding frequencies (plus a correction in the phase
parameters). The transient location in the time domain segment is denoted by $n_0$ and the
closest allowed location from a time grid is denoted by $\hat{n}$. Then the desired time shift is
defined as

$\Delta n = n_0 - \hat{n}$ (2)

In order to modify the transient location by $\Delta n$ the frequencies $\omega_{ij}$ and phases
$\phi_{ij}$ corresponding to this transient should be modified as follows:

[equations (3) and (4), giving the modified frequencies and phases, are illegible in the source]

No modification of amplitudes $A_{ij}$ is needed.
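
The effect of a frequency modification on the transient location can be illustrated under explicit assumptions. The sketch below assumes an unnormalised DCT of type II (the DCT variant is not stated in the text) and a single cosine spanning the whole DCT-domain segment, so that no per-segment phase correction is involved. Under those assumptions, lowering the frequency by pi*dn/N moves a unit impulse from n0 to n0 - dn. The function names are hypothetical.

```python
import math


def idct2(c):
    """Inverse of the unnormalised DCT-II, computed directly."""
    n_points = len(c)
    return [c[0] / n_points
            + (2.0 / n_points) * sum(
                c[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
                for k in range(1, n_points))
            for n in range(n_points)]


def shifted_location(n_points, n0, delta_n):
    """Location of the impulse after shifting its DCT-domain frequency.

    The impulse at n0 corresponds to frequency pi*(2*n0 + 1)/(2*N) under the
    DCT-II assumption; the frequency is reduced by pi*delta_n/N.
    """
    omega = math.pi * (2 * n0 + 1) / (2 * n_points)
    omega_new = omega - math.pi * delta_n / n_points
    spectrum = [math.cos(omega_new * k) for k in range(n_points)]
    x = idct2(spectrum)
    return max(range(n_points), key=lambda n: abs(x[n]))
```

For example, `shifted_location(64, 23, 3)` gives 20, consistent with the time shift of equation (2) for a grid location of 20.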

Note that the above procedure is different from independent quantization of
sinusoidal parameters. All frequencies corresponding to one transient are modified by the
same amount. This, together with the phase correction of equation (4) above, ensures that the
shape of the time domain transient is preserved; only the location is modified.

Because the DCT size is relatively large at one second, more than one
transient can occur in a time domain segment. In this case, the model has to identify
sinusoidal parameters corresponding to different transients. This is done by declaring close
sinusoidal frequencies $\omega_{ij}$ to represent the same transient. Specifically, two sinusoids having
frequencies differing by not more than $\delta\omega$ are declared to represent the same transient and two
sinusoids having frequencies differing by more than $\delta\omega$ are declared to represent different
transients. Then locations of all transients are modified separately. Below, when reference is
made to a group of frequencies $\omega_{ij}$, reference is being made to frequencies corresponding to a
particular transient.
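
One way to form such groups is sketched below. It sorts the frequencies and starts a new group whenever the gap to the previous frequency exceeds the threshold; applying the rule to neighbouring frequencies in this chained fashion is an interpretation adopted for illustration, and the function name is hypothetical.

```python
def group_by_frequency(freqs, delta_omega):
    """Group sinusoidal frequencies that are assumed to share a transient.

    Neighbouring sorted frequencies that differ by not more than
    `delta_omega` are placed in the same group.
    """
    groups = []
    for w in sorted(freqs):
        if groups and w - groups[-1][-1] <= delta_omega:
            groups[-1].append(w)
        else:
            groups.append([w])
    return groups


groups = group_by_frequency([0.50, 0.51, 1.20, 0.505, 1.21], 0.02)
```

Each resulting group would then be shifted as a whole, so that all frequencies of one transient change by the same amount.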

A transient can occur at the beginning or at the end of a time domain segment.
In this case, the modification of sinusoidal frequencies can yield frequencies below 0 or
above $\pi$. This results in distortion of the shape of the time domain transient. To account
for this, an overlap is allowed between time domain segments (0.1 seconds). In this case a
transient can appear in two overlapping segments, i.e. in the region of mutual overlap.
Because the overlap is sufficiently large, if the transient is located very close to a border of
one of the overlapping segments, then it is located at a safe distance from a border of the
other segment. It is straightforward to identify the transient location from sinusoidal
frequencies, and therefore it is easy, knowing the estimated sinusoidal frequencies in the two
overlapping segments, to identify when a transient is represented in two segments. If such a
situation occurs, the corresponding sinusoids are cancelled in the segment where the
transient is closer to the corresponding border.

A typical transient lasts for more than one time sample. A natural question is
then what is the location $n_0$ of the transient. After the modification of location the
corresponding sample of the transient will be placed at location $\hat{n}$ corresponding to the
beginning of a segment defined by the time grid. Therefore, it is important that the estimated
value $n_0$ corresponds to the start of the transient. The time domain approach described below
has proved to yield good results. First, the time samples $n_{min}$ and $n_{max}$ are identified
corresponding to the frequency values $\min(\omega_{ij})$ and $\max(\omega_{ij})$, where $\omega_{ij}$ are frequencies of
sinusoids corresponding to a particular transient. Next, the highest amplitude of the
estimated transient signal in the time interval $[n_{min}, n_{max}]$ is found. Then, the start sample of
the transient $n_0$ is defined to be the first sample in the interval $[n_{min}, n_{max}]$ having amplitude
higher than 10% of the highest amplitude.
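
The start-sample rule can be sketched directly. The function name is hypothetical, and the interval bounds are taken as given rather than derived from the sinusoidal frequencies.

```python
def transient_start(x, n_min, n_max, fraction=0.1):
    """First sample in [n_min, n_max] whose magnitude exceeds
    `fraction` times the largest magnitude in that interval.

    Returns None if the interval contains only zeros.
    """
    interval = range(n_min, n_max + 1)
    peak = max(abs(x[n]) for n in interval)
    for n in interval:
        if abs(x[n]) > fraction * peak:
            return n
    return None


estimated = [0.0, 0.01, 0.05, 0.2, 1.0, 0.6, 0.1]
n0 = transient_start(estimated, 0, 6)
```

Here the peak is 1.0, so the threshold is 0.1 and the small samples at indices 1 and 2 are treated as preceding the transient.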

Typically, the estimated transient component of an audio signal contains
samples of small amplitudes before the sample $n_0$. Because the time sample $n_0$ is declared to
be the first sample of the transient and no transient can occur at a distance defined by $\delta\omega$
before the transient, the corresponding samples before $n_0$ are forced to have zero amplitude.
As a result, those samples go to the residual signal with their original amplitudes.

Having estimated the location of transients and modified their location as
described above, the modified signal can now be modeled to allow the signal to be coded.

A damped sinusoidal model is used to model the modified signal, which aims
at approximating a signal $s$ with a sum of sinusoids with exponentially modulated amplitudes,

$\hat{s}(n) = \sum_{m=1}^{M} r_m p_m^{n}, \quad n = 0, \ldots, N-1$ (5)

where $r_m, p_m \in \mathbb{C}$ and $N$ is the segment length. Equation 5 expresses $\hat{s}(n)$ as the sum of $M$
damped (complex) exponentials. The parameter $r_m$ determines the initial phase and
amplitude, while $p_m$ determines the frequency and damping. In order to determine the
parameters $r_m$ and $p_m$ for the exponentials the matching pursuit algorithm was used, as
described in "Matching pursuits with time-frequency dictionaries", IEEE Transactions on
Signal Processing, Volume 41, pp 3397-3415, December 1993. Matching pursuit
approximates a signal by a finite expansion into elements chosen from a redundant
dictionary. Let $D = (g_\gamma)_{\gamma \in \Gamma}$ be a complete dictionary of unit-norm elements. The matching
pursuit algorithm is a greedy iterative algorithm which projects a signal $s$ onto the dictionary
element $g_\gamma$ that best matches the signal and subtracts this projection to form a residual signal
to be approximated in the next iteration. Finding the best matching dictionary element
consists of computing the inner products $\langle s, g_\gamma \rangle$ and selecting the element that maximises the
inner product. In order to find the parameters $r_m$ and $p_m$ a dictionary is constructed consisting
of damped exponentials,

$g_{\alpha,\nu}(n) = c\,\alpha^{n} e^{j\nu n}, \quad n = 0, \ldots, N-1$ (6)

where the constant $c$ is introduced for having unit-norm dictionary elements,
and the inner products of the residual signal at iteration $m$, $s_m$, and the dictionary
elements defined in equation 6 are computed:

$\langle s_m, g_{\alpha,\nu} \rangle = c \sum_{n=0}^{N-1} s_m(n)\,\alpha^{n} e^{-j\nu n}$ (7)

By doing this for different values of $\alpha$, the transfer function $S_m(z)$ is evaluated
on circles in the complex z-plane having radius
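
A minimal sketch of matching pursuit over a dictionary of unit-norm damped complex exponentials is given below. It uses an exhaustive search over small, hypothetical grids of damping factors and frequencies and selects the atom with the largest inner-product magnitude; it illustrates the greedy project-and-subtract loop only and is not the implementation used for the reported experiments.

```python
import cmath
import math


def atom(alpha, nu, n_points):
    """Unit-norm damped exponential g(n) = c * alpha**n * exp(j*nu*n)."""
    g = [alpha ** n * cmath.exp(1j * nu * n) for n in range(n_points)]
    norm = math.sqrt(sum(abs(v) ** 2 for v in g))
    return [v / norm for v in g]


def inner(s, g):
    """Inner product <s, g> = sum over n of s(n) * conj(g(n))."""
    return sum(a * b.conjugate() for a, b in zip(s, g))


def matching_pursuit(s, alphas, nus, n_iter):
    """Greedy decomposition of s; returns (picks, residual).

    Each pick is (coefficient, alpha, nu) for the atom whose inner product
    with the current residual has the largest magnitude.
    """
    residual = list(s)
    picks = []
    for _ in range(n_iter):
        best = None
        for alpha in alphas:
            for nu in nus:
                g = atom(alpha, nu, len(s))
                coef = inner(residual, g)
                if best is None or abs(coef) > abs(best[0]):
                    best = (coef, alpha, nu, g)
        coef, alpha, nu, g = best
        # Subtract the projection onto the selected atom.
        residual = [r - coef * v for r, v in zip(residual, g)]
        picks.append((coef, alpha, nu))
    return picks, residual
```

When the signal is itself a single dictionary element scaled by a constant, one iteration recovers its damping and frequency and leaves a residual that is zero up to rounding.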

The method described above has been experimentally tested and the following
gives results and discussion of computer simulations and informal listening tests performed
on audio signals. The audio excerpts used were a castanet signal, songs by ABBA, Celine
Dion, Metallica and a vocal by Suzanne Vega. The signals were sampled at 44.1 kHz. The
DCT size is 44288 samples (approximately 1 second) and the overlap between time domain
segments is 4410 samples (0.1 seconds). The sinusoidal analysis of the DCT domain signals
is done using Hanning windows of length 512 samples and mutual overlap of 256 samples.
The transient component of the signal was estimated and subtracted to form the residual
signal. Next, the transient locations were modified according to a time grid of 220 samples
(approximately 5 ms).
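
The figures quoted above can be cross-checked with a few lines of arithmetic. The constant and function names are hypothetical, and the window count assumes that windows are placed every 256 samples without padding at the ends of the DCT-domain segment, which the text does not state.

```python
FS = 44100          # sampling rate in Hz
DCT_SIZE = 44288    # DCT size in samples
OVERLAP = 4410      # overlap between time domain segments in samples
WIN = 512           # Hanning window length in samples
HOP = WIN - 256     # shift between windows for a 256-sample overlap
GRID = 220          # time grid spacing in samples


def seconds(n_samples):
    """Duration of n_samples at the sampling rate FS."""
    return n_samples / FS


def num_windows(total, win, hop):
    """Number of whole windows of length `win` placed every `hop` samples."""
    return (total - win) // hop + 1


dct_seconds = seconds(DCT_SIZE)                 # about 1.004 s
overlap_seconds = seconds(OVERLAP)              # 0.1 s
grid_ms = 1000 * seconds(GRID)                  # about 4.99 ms
windows = num_windows(DCT_SIZE, WIN, HOP)       # 172 under the no-padding assumption
```

The DCT size of 44288 samples is an exact multiple of the 256-sample shift (173 x 256), which is why the windows tile the segment without remainder.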

It is important to verify that the modification of the transient locations does not
introduce any audible distortion. To check that, the modified transient signal was added to
the residual signal. The listening tests conducted verified that there is no perceptual
difference between the thus obtained signal and the original audio signal.

In the following, the improvement due to the modification procedure will be
illustrated. Also discussed is the performance of a damped sinusoidal model with the
restricted segmentation for an original transient signal (i.e. generally a transient starts at an
arbitrary location) and the modified transient signal (a transient starts in the beginning of a
segment). The optimal restricted time segmentation (with the minimum segment size of 220
samples) for damped sinusoids is found using the technique proposed in "Flexible
tree-structured signal expansions using time-varying wavelet packets" in IEEE Transactions on
Signal Processing, Volume 45, pp 333-345, February 1997. The performance is studied in
terms of signal-to-noise ratio (SNR) versus number of damped sinusoids NDS and is well
illustrated by Figure 1, where results are presented for a particular transient of the castanet
signal: A represents the original transient and B represents the shifted transient. The
modification procedure results in a considerably smaller number of damped sinusoids needed
to represent the transient with a certain quality than would previously have been the case.
Lower plots of Figures 2 and 3 show the reconstruction with 25 damped sinusoids of the
original and the modified transients, respectively. In these Figures t[ms] denotes time in
milliseconds. The original transient is not located in the beginning of the segment and, as a
result, the modeling error is distributed to samples before the transient. This results in an
audible pre-echo. On the other hand, the modified transient is located in the beginning of the
segment and, as a result, the pre-echo problem is eliminated.

Figure 4 shows a flow diagram of the first embodiment having steps S1 to S6,
where:

S1 represents: Estimate the location of transients in a first time segment of an input signal, by
a transformation into the frequency domain.








S2 represents: Modify the location of the transients in the spatial domain by modifying the
corresponding frequencies, to locations on a predetermined time scale.

S3 represents: Estimate the location of transients in second and subsequent time segments of
the transient signal, by a transformation into the frequency domain.

S4 represents: Modify the location of the transients in the spatial domain by modifying the
corresponding frequencies, to locations on a predetermined time scale.

S5 represents: Decompose an audio signal into transient, tonal and noise components.

S6 represents: Recombine the decomposed signal for transmission or playback.

It may be possible that a similar improvement to that mentioned above would
be achieved in the case of a full-precision variable segmentation (and no signal
modification). However, the restricted segmentation and the modification procedure result in
a much lower total computational cost. Also, less side information is required to describe the
restricted segmentation.

A second embodiment of the coding method involves a different method of
estimating the location of transients in an input signal and a different modification procedure.
The locations of transients are modified in such a way that a transient can only occur at the
beginning of a sinusoidal segment, which sinusoidal segments are defined by a specified
segment size, which may be 5 milliseconds (ms); this is referred to as a restricted
segmentation, and corresponds to that of the first embodiment. The reference to a beginning
of a sinusoidal segment can be taken to be a reference to a beginning of a time grid in the first
embodiment; the reference to a sinusoid simply refers to the modeling procedure used.

This second embodiment uses the same idea as the first embodiment in that
transient locations are modified to improve the modeling of signals, in particular, audio
signals. However, this second embodiment provides an improved method of modifying the
location of transients.

To summarize the first method, the input signal was modified by estimating
the location of transient components using a model based on the duality between the time and
frequency domain for the signal; subtracting the transient component; modifying the
locations of transients such that their beginnings can only occur at the beginnings of
sinusoidal segments and a restricted segmentation; and adding the modified transient to the
residual signal in order to obtain a modified audio signal.

In outline, the method of the second embodiment involves detecting the
beginnings and ends of transients in an audio signal using an energy-based approach with two
sliding rectangular windows, as described in "Audio subband coding with improved
representation of transient signal segments", from Proceedings of EUSIPCO, pages 2345-
2348, Greece 1998, incorporated herein by reference; followed by moving the identified
transients to locations specified by a chosen time grid or sinusoidal segmentation grid; and
time-warping parts of the signal between the identified transients in order to fill the intervals
between the modified transients.

The transient detection approach as described in "Audio subband coding with
improved representation of transient signal segments", mentioned above, is based on the
evaluation of the criterion function, C(n):

[Equation defining C(n) in terms of E_L(n) and E_R(n); illegible in the source.]

where n is a time sample, and E_L(n) and E_R(n) are the energies of the input signal within
length-N rectangular windows on the left- and right-hand side of the time sample n.

Significant peaks of the criterion function C(n) correspond to the beginnings of transients.
The end of a transient is defined by searching for the first value of C(n) after the beginning of a
transient which is just below a certain threshold.
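The detection step described above can be sketched as follows. This is only an illustration under stated assumptions: the exact form of C(n) is not legible here, so the sketch assumes a log energy ratio weighted by the right-hand energy; the function names, the `eps` guard and both thresholds are hypothetical.

```python
import math

def criterion(x, N, eps=1e-12):
    """Evaluate a criterion C(n) from left/right window energies."""
    # Prefix sums of squared samples give each window energy in O(1).
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v * v)
    C = [0.0] * len(x)
    for n in range(N, len(x) - N):
        e_left = prefix[n] - prefix[n - N]           # samples n-N .. n-1
        e_right = prefix[n + N + 1] - prefix[n + 1]  # samples n+1 .. n+N
        # ASSUMED form: log energy ratio weighted by the right-hand energy.
        C[n] = math.log((e_right + eps) / (e_left + eps)) * e_right
    return C

def locate_transient(C, peak_threshold, end_threshold):
    """Return (begin, end) of the most significant transient, or None."""
    begin = max(range(len(C)), key=C.__getitem__)
    if C[begin] < peak_threshold:
        return None
    end = len(C) - 1
    # The end is the first value of C(n) after the beginning that falls
    # below the (hypothetical) end threshold.
    for n in range(begin + 1, len(C)):
        if C[n] < end_threshold:
            end = n
            break
    return begin, end
```

With this assumed criterion the peak falls on the sample just before the onset, because the right-hand window starts at n+1.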

Once the beginnings and ends of the transients have been located using the
above method, the transients are simply removed from the signal and relocated to the nearest
location on the specified sinusoidal segmentation grid, effectively by a cut and paste method.
This part of the procedure is particularly straightforward and is easily implemented by the
person skilled in the art.
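As a minimal sketch of the relocation step, assuming sample indices and a grid spacing given in samples (both function names are hypothetical, and ties are resolved toward the later grid point as an arbitrary choice):

```python
def nearest_grid_location(begin, grid):
    """Nearest multiple of `grid` (in samples) to a transient beginning."""
    return (begin + grid // 2) // grid * grid

def cut_transient(signal, begin, end, grid):
    """Cut the transient samples and report where they should be pasted."""
    transient = signal[begin:end]
    new_begin = nearest_grid_location(begin, grid)
    shift = new_begin - begin  # positive: moved later, negative: moved earlier
    return transient, new_begin, shift
```

For example, with a grid of 220 samples a transient beginning at sample 1000 would be pasted at sample 1100, a shift of 100 samples.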

As would be appreciated, due to the modification of the transient locations, the
distance between two consecutive transients in an audio signal can become longer (e.g. if one
is shifted forward and the other is shifted backward), or the distance can become shorter (e.g.
if a first transient is shifted backwards and a second transient is shifted forwards). In
Figure 5 examples of transient modification where the distance is increased are shown, whereas
in Figure 6 a reduced distance between transients is shown. In order to fill the interval
between the modified transients, the signal part in between must be modified in some way to
allow for the greater or smaller distance between transients.

The signal is modified by time-warping; this is done in such a way that
preserves the correct amplitudes of the edge points of the signal in between the transients,
so that no discontinuities are introduced just before or just after a transient, as described
below. The time-warping results in the signal between transients being stretched (as shown
in Figure 5) or compressed (as shown in Figure 6). To compute the amplitudes at the new
integer sampling positions based on the known amplitudes of the original samples, a band-
limited interpolation method based on sinc functions is used (the band-limited interpolation is
described in Proakis and Manolakis, "Digital Signal Processing. Principles, Algorithms and
Applications", Prentice-Hall International, 1996). A modified Hanning window is used. To
compute the amplitude of each new sample, amplitudes of eight original samples are used,
four at each side of the new sample.
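The interpolation described above can be illustrated with a windowed-sinc resampler that uses eight original samples per output sample. This is a sketch only: the text does not specify how the Hanning window is modified, so a plain Hann taper over the eight-sample support is assumed, and the name `time_warp` is hypothetical.

```python
import math

def sinc(t):
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

def time_warp(x, new_len, half_width=4):
    """Resample x to new_len samples (new_len >= 2), keeping both edge points."""
    old_len = len(x)
    y = []
    for j in range(new_len):
        # Position of new sample j on the original sampling grid.
        t = j * (old_len - 1) / (new_len - 1)
        k0 = math.floor(t)
        acc = 0.0
        # Four original samples on each side of the new sample position.
        for k in range(k0 - half_width + 1, k0 + half_width + 1):
            if 0 <= k < old_len:  # samples outside the segment are skipped
                d = t - k
                # ASSUMED window: plain Hann taper, zero at |d| = half_width.
                w = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                acc += x[k] * sinc(d) * w
        y.append(acc)
    return y
```

Because the first and last new samples coincide with original sample positions, where the sinc kernel is one at the centre and zero at the other integers, the edge amplitudes are reproduced and no discontinuity is introduced next to a transient.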

The stretching or compressing of a signal results, for tonal signals, in a
corresponding change of the fundamental frequency, f0. The goal of the modification
procedure is to ensure that the induced modifications of f0 are not audible.

In order to achieve the modification, the following algorithm is used for time-
warping the part of the signal between the two identified and modified transients:
(a) if the required change in length of a signal part in between two transients results in a
change of f0 by no more than 0.2%, the signal is simply subjected to a band-limited
interpolation method based on sinc functions. This is the example shown in Figures 5a
and 6a. If f0 changes by more than 0.2%, then follow step (b) as described below.



The reason for the limit of 0.2% is that it has been determined from the
literature on psycho-acoustics that changing f0 of a tonal sound by 0.2% can be audible, as
described in "An introduction to the psychology of hearing", Academic Press, 1997. Our
own experiments verify this result.

(b) The signal part in between two transients is split into two non-overlapping intervals; the
first interval is located directly after the end of the first transient and lasts 10 ms (as
illustrated by interval 1 in Figures 5b and 6b), and the second interval is the remaining
part, i.e. it lasts until the beginning of the second transient (as shown by interval 2 in
Figures 5b and 6b). The lengths of the two intervals are modified by a different amount.
If the required change in length of the signal part in between two transients can be done
by changing f0 in the first interval by no more than 2% and in the second interval by no
more than 0.2%, then the signal in the two intervals is time-warped correspondingly as
shown in the lower parts of Figures 5b and 6b. Otherwise go to step (c) as described
below.

The reasoning behind step (b) is that the interval directly after the end of a
transient is the interval where the masking effect from the transient is strong. Therefore,
larger changes of the signal in this interval are possible before they become audible. Our
experiments verify that a change of f0 by no more than 2% in the interval 10 ms directly after
the end of a transient is inaudible.

(c) time-warp the signal in the two intervals such that the resulting change of f0 is no more
than 2% in interval 1 and no more than 0.2% in interval 2. If the resulting change
in length is not sufficient to fill the distance between the shifted transients, then apply an
overlap-add procedure with a modified Hanning window using samples from the two
intervals in order to increase or decrease the length of the signal. To ensure a smooth
transition between two intervals, the length of the overlap-add region is chosen to be
larger than required to obtain a correct length of the signal in between two transients
(Figures 5c and 6c).
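The decision logic of steps (a) to (c) can be sketched as below. Several details are assumptions rather than statements of the method: the relative change of f0 is taken as |old_len / new_len - 1|, interval 2 is warped to its 0.2% limit first with interval 1 absorbing the rest in step (b), and the function name and return layout are hypothetical.

```python
import math

def plan_time_warp(old_len, new_len, fs=44100, first_ms=10.0,
                   limit_first=0.02, limit_rest=0.002):
    """Return (step, len_interval_1, len_interval_2, overlap_add_samples)."""
    # ASSUMPTION: a length change from old_len to new_len scales f0 by
    # old_len / new_len.
    if abs(old_len / new_len - 1.0) <= limit_rest:
        return ("a", new_len, 0, 0)  # one interval, plain interpolation
    n1 = min(old_len, round(first_ms * 1e-3 * fs))  # 10 ms after the transient
    n2 = old_len - n1                               # remaining part
    if new_len > old_len:
        # Longest lengths that keep each interval within its f0 limit.
        m1 = math.floor(n1 / (1.0 - limit_first))
        m2 = math.floor(n2 / (1.0 - limit_rest))
        reachable = m1 + m2 >= new_len
    else:
        # Shortest lengths that keep each interval within its f0 limit.
        m1 = math.ceil(n1 / (1.0 + limit_first))
        m2 = math.ceil(n2 / (1.0 + limit_rest))
        reachable = m1 + m2 <= new_len
    if reachable:
        # ASSUMED split: interval 2 at its limit, interval 1 takes the rest.
        return ("b", new_len - m2, m2, 0)
    # Step (c): both intervals at their limits; the remaining samples would
    # have to be supplied (or removed) by overlap-add.
    return ("c", m1, m2, new_len - (m1 + m2))
```

For a 10000-sample gap at 44.1 kHz, a target of 10010 samples falls under step (a), 10025 samples under step (b), and 10100 samples under step (c).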

In Figures 5 and 6 the new locations of transient beginnings are depicted with
small arrows. In Figure 5 the signal part in between two transients becomes longer. In Figure
6 the signal part in between two transients becomes shorter. In the lower part of Figure 6c a
small vertical shift is shown for clarity's sake.

Various computer simulations of the method of the second embodiment,
together with informal listening tests with audio signals, were carried out. The audio excerpts
used were castanets, bass, trumpet, Celine Dion, Metallica, harpsichord, Eddie Rabbit,
Stravinsky and Orff. The signals were sampled at 44.1 kHz. The transient locations were
modified according to a time grid of 220 samples (approximately 5 ms). It is important to
verify that the modification of transient locations does not introduce any audible distortion.
The listening tests conducted verified that there is no perceptual difference between the
original and modified audio signals.
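As a quick check of the grid spacing quoted above, using only the stated sampling rate and grid size:

```latex
\frac{220\ \text{samples}}{44\,100\ \text{samples/s}} \approx 4.99\ \text{ms},
\qquad
5\ \text{ms} \times 44.1\ \text{kHz} = 220.5\ \text{samples}
```

so 220 samples is the whole number of samples just below a 5 ms segment.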

Next, it was demonstrated that there is an improvement in the modeling of the
signal due to the modification procedure. A comparison was made between the performance
of a damped sinusoidal model with the restricted segmentation for an original transient signal
(i.e. generally a transient starts at an arbitrary location) and for a modified transient signal (a
transient starts at the beginning of a segment, as defined by the present method). The lower
parts of Figures 7 and 8 show the reconstruction with 25 damped sinusoids of the original and
the modified transients, respectively. The original transient is not located at the beginning of
the segment and, as a result, the modeling error is distributed to samples before the transient.
This results in an audible pre-echo, shown by the amplitude of the signal in the lower part
of Figure 7 between 5 ms and approximately 7.5 ms, which is not present in the upper part of
Figure 7 that shows the original transient. On the other hand, the modified transient is
located at the beginning of the segment and, as a result, the pre-echo is eliminated, as
demonstrated in Figure 8 in that the amplitude of the signal for upper and lower parts of the
figure moves from zero immediately after 5 ms, i.e. both at the same time.

Figure 9 shows a flow diagram of the second embodiment having steps T1 to
T6, where:

T1 represents: Estimate the location of transients (beginning and end) in a first time segment
of an input signal, by an energy-based approach.

T2 represents: Modify the location of the transients by cutting and pasting to locations on a
predetermined time scale, and time-warp the signal parts in between.

T3 represents: Estimate the location of transients (beginning and end) in second and
subsequent time segments of the input signal.

T4 represents: Modify the location of the transients as above, and time-warp the signal parts in
between.

T5 represents: Decompose the audio signal into transient, tonal and noise components.

T6 represents: Recombine the decomposed signal for transmission or playback.

The method described in the second embodiment provides a more general
procedure and provides good results, which are an improvement on those of the first
embodiment. The time-warping principle is based on the knowledge of sound perception, and
the procedure of the second embodiment is less complex to implement and utilize.

The advantages of the second embodiment over prior art methods and also the 
first embodiment are that the transient detection model is more general and provides good 
results for various transients, not just short transients. Also, the time-warping of the signal 
parts between transients is based on the knowledge of the properties of sound perception, 
such as pitch perception and temporal masking effects. Furthermore, the method of the 
second embodiment results in a significantly lower computational complexity. 

Both of the methods disclosed herein provide a particularly advantageous
method for coding audio and video signals. In particular, restricting the transient locations
simplifies the analysis procedure in an audio coder (involving transient, sinusoidal and noise
models) significantly. Also, the side information associated with the corresponding
segmentation is reduced because of the restricted segmentation used in the two
embodiments described.

Furthermore, the introduced difference in transient locations is not of
perceptual importance.

The method could be implemented in devices for storing, transmitting,
receiving, or reproducing audio and/or video, e.g. solid state audio devices. Figure 10 shows
an audio coder 10 and an audio decoder 12 which receive an audio signal (A) for coding and
a coded signal (C) for decoding respectively, with the decoder 12 outputting the audio signal
A. In particular, the audio coder may be included in a transmitting or recording device,
further comprising a source or receiver for obtaining the audio signal, the device
transmitting/outputting the coded signal to a transmission medium or a storage medium (e.g.
a solid state memory).

For stereo audio signals, the time and intensity with which a signal
reaches both ears play a major role in the localization of sounds, i.e. the perception of direction
and distance to the sound source. More precisely, it is the difference in time (interaural time
difference) and difference in intensity (interaural intensity difference) with which the signal
reaches both ears which form the so-called stereo image. We deal with time
modifications of audio signals for the purpose of efficient modeling. Therefore, below we
will concentrate our attention on the resulting interaural (interchannel) time differences.

The audibility of interchannel time difference and the relative importance of
transients and ongoing parts in the formation of the stereo image depend upon a variety of factors,
including duration of sounds, frequency content and repetition rate (for transients). The
important result, however, is that interchannel time differences as small as of the order of 10 µs
can be detected by the auditory system (using cues either from transients or ongoing parts).

When modifying transient locations, the ongoing parts are also modified due
to the time shift and time warping, i.e. both important cues are present. Therefore, care has to
be taken not to destroy the original stereo image.

An efficient modeling with damped sinusoids can be obtained if transient
locations in both stereo channels are modified such that the transients start at the beginnings
of the sinusoidal segments. Independent modifications in the two channels would,
however, generally result in a destroyed stereo image. A possible solution to this problem
could be to modify the transient locations according to the sinusoidal segmentation before
modeling with damped sinusoids, but to send side information describing the original time
differences between corresponding transients in the two channels to the decoder. Then, at the
decoder, the synthesized signal in one of the channels can be unwarped according to the
original time difference. As a result, the synthesized transients occur generally at locations
different from their original locations, but the interchannel time difference between the two
transients is preserved. This solution is especially suitable for highly-correlated stereo
channels, having similar detected transients with low interchannel time differences.
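The stereo side-information idea can be sketched as follows. This is a simplified illustration: unwarping is reduced to a plain shift of one channel's transient location, and all names as well as the choice of the right channel are hypothetical.

```python
def encode_stereo_transient(begin_left, begin_right, grid):
    """Snap both channels to the grid and keep the original difference."""
    def snap(b):
        return (b + grid // 2) // grid * grid
    # Side information: original interchannel time difference, in samples.
    side_info = begin_right - begin_left
    return snap(begin_left), snap(begin_right), side_info

def decode_right_location(coded_left, coded_right, side_info):
    """Location for the right-channel transient that restores the difference."""
    target_right = coded_left + side_info
    # ASSUMPTION: the decoder moves only the right channel by this amount.
    return target_right, target_right - coded_right
```

For transients beginning at samples 1000 (left) and 1003 (right) and a 220-sample grid, both are coded at sample 1100; the decoder then moves the right-channel transient to 1103, so the 3-sample difference is preserved although neither transient is at its original location.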
It should be noted that the above-mentioned embodiments illustrate rather than
limit the invention, and that those skilled in the art will be able to design many alternative
embodiments without departing from the scope of the appended claims. In the claims, any
reference signs placed between parentheses shall not be construed as limiting the claim. The
word 'comprising' does not exclude the presence of other elements or steps than those listed
in a claim. The invention can be implemented by means of hardware comprising several
distinct elements, and by means of a suitably programmed computer. In a device claim
enumerating several means, several of these means can be embodied by one and the same
item of hardware. The mere fact that certain measures are recited in mutually different
dependent claims does not indicate that a combination of these measures cannot be used to
advantage.

In summary, an improved representation of transients in audio signals
comprises modifying transient locations in such a way that a transient can occur only at a
beginning of a sinusoidal segment. The modification procedure comprises the steps:
- detecting a beginning and an end of a transient using an energy-based approach with two
sliding rectangular windows;

- moving samples between the beginning and the end of the transient to the locations
specified by the segmentation used; and

- time-warping the signal parts in between the transients in order to fill the intervals between
the modified transients.

CLAIMS:



1. A method of coding an input signal, the method comprising:

- estimating a location of at least one transient in a time segment of the input signal;
the method being characterized by

- modifying the location of the transient so that the transient occurs at a specified location on
a predetermined time scale to obtain a modified signal; and
- modeling the modified signal.

2. A method of coding as claimed in claim 1, in which each transient is relocated
to a nearest specified location of a plurality of possible locations on the predetermined
time scale.

3. A method of coding as claimed in claim 1, in which the specified locations on
the predetermined time scale are defined by integer multiples of a predetermined minimum
time segment size.

4. A method of coding as claimed in claim 3, in which the predetermined
minimum time segment size has a length in the range of approximately 1 millisecond (ms) to
approximately 9 ms.

5. A method of coding as claimed in claim 1, in which the modeling uses
sinusoids to represent the modified input signal.

6. A method of coding as claimed in claim 1, in which a restricted time
segmentation is also applied to tonal and/or noise components of the input signal.

7. A method of coding as claimed in claim 1, in which the estimation of the
location of transients is carried out using an energy-based approach.

8. A method of coding as claimed in claim 7, in which the estimation of the
location of transients is carried out using two sliding windows.



9. A method of coding as claimed in claim 1, in which the location of transients
involves the location of a beginning and an end of each transient.

10. A method of coding as claimed in claim 1, in which each located transient is
moved by a cut and paste method from its original location to begin at a location on the
predetermined time scale.

11. A method of coding as claimed in claim 10, in which a remaining section of
the input signal between two located modified transients is time-warped to fill the gap
remaining following the relocation.

12. A method of coding as claimed in claim 11, in which the time-warp is a
lengthening or a shortening of said remaining section.

13. A method of coding as claimed in claim 11, in which the time-warping
preserves the amplitudes of edge points of the modified signal.

14. A method of coding as claimed in claim 11, in which the time-warp is carried
out by interpolation where the change in the fundamental frequency of the remaining section
is less than approximately 0.3%.

15. A method of coding as claimed in claim 11, in which, where the change in the
fundamental frequency of the remaining section is more than or equal to 0.3%, the remaining
section is split into a first length immediately after the modified transient and a second
length.

16. A method of coding as claimed in claim 15, in which the first length is
approximately 8 ms to 12 ms.

17. A method of coding as claimed in claim 14, in which, where the interpolation
is insufficient to fill a gap in the remaining section, an overlap-add procedure is used.

18. A method of coding as claimed in claim 1, in which the modification of the
location of the or each transient is performed using a transformation into a frequency domain.



19. A method of coding as claimed in claim 1, wherein the method comprises
including side information in the modeled modified signal, which side information describes
an original time difference between corresponding transients in at least two channels.

20. A method of decoding comprising receiving a modeled modified signal in
which a location of transients in at least two channels has been modified, the modeled
modified signal further comprising side information describing an original time difference
between corresponding transients, the method comprising:

synthesizing a synthesized signal for the at least two channels, and
unwarping the synthesized signal according to the original time difference.

21. Modeled modified signal in which a location of transients in at least two
channels has been modified, the signal further comprising side information describing an
original time difference between corresponding transients in the at least two channels.

22. Storage medium on which a modeled modified signal as claimed in claim 21
has been stored.



23. Decoder comprising:

means for receiving a modeled modified signal in which a location of
transients in at least two channels has been modified, the signal further comprising side
information describing an original time difference between corresponding transients in the at
least two channels, and

means for synthesizing a synthesized signal for the at least two channels, and
unwarping the synthesized signal according to the original time difference.

24. Audio player comprising a decoder as claimed in claim 23 and a reproduction
unit for reproducing the unwarped synthesized signal.

25. Apparatus (10) for coding signals comprises an electronic processor operable
to:
- estimate the location of one or more transients in a time segment of an audio or video
signal;

characterized by the processor being operable to modify the location of the or each transient
so that the or each transient occurs at a specified location on a predetermined time scale, and
the processor is operable to model the modified input signal.

26. Apparatus (10) as claimed in claim 19, which is an audio device.

ABSTRACT:



An improved representation of transients in audio signals comprises modifying
transient locations in such a way that a transient can occur only at a beginning of a sinusoidal
segment. The modification procedure comprises the steps:

- detecting a beginning and an end of a transient using an energy-based approach with two
sliding rectangular windows;

- moving samples between the beginning and the end of the transient to the locations
specified by the segmentation used; and

- time-warping the signal parts in between the transients in order to fill the intervals between
the modified transients.

(Figure 9)